
Python Data Mining Quick Start Guide by Nathan Greeneltch

Author: Nathan Greeneltch, Date: July 14, 2023

Language: eng
Format: epub
Tags: COM018000 - COMPUTERS / Data Processing, COM062000 - COMPUTERS / Data Modeling and Design, COM089000 - COMPUTERS / Data Visualization
Publisher: Packt
Published: 2019-04-24T11:20:04+00:00

PCA

PCA (principal component analysis) is used to reduce the dimensions of data in an unsupervised manner. The method's goal is to identify new feature vectors that maximize the variance in the data, and then project the original data into the space defined by those vectors. Please revisit the short example in the previous section for an intuitive description.
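The two steps named above (find the directions of maximum variance, then project onto them) can be sketched with plain NumPy. This is a minimal illustration rather than the book's own code: the function name `pca_project` and the toy data are hypothetical, and the covariance eigendecomposition used here is one common way to compute the directions.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so variance is measured about the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (rowvar=False: columns are features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the direction of greatest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep the leading directions and project the centered data onto them.
    components = eigenvectors[:, :n_components]
    return X_centered @ components, eigenvalues

# Hypothetical toy data: two strongly correlated features.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=200)])

projected, variances = pca_project(X, n_components=1)
print(projected.shape)              # (200, 1)
print(variances / variances.sum())  # share of variance per direction
```

Because the two toy features are nearly collinear, almost all of the variance should fall along the first direction, which is why a single component is kept here.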

The new feature vectors that maximize variance are the eigenvectors of the data's covariance matrix, and they are the principal components. There is one component for each original feature (assuming you have at least as many samples as features). The power of this method comes when you drop the less important components and keep only the most informative ones, thus lowering the number of dimensions. A fitted scikit-learn PCA object has an explained_variance_ attribute that can be used to rank the importance of each principal component. More commonly in data mining, you will use the n_components argument to specify a new, lower number of dimensions and let scikit-learn sort by variance and drop the low-variance components automatically.
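As a rough sketch of both approaches in scikit-learn, the following fits PCA on the full four-feature iris dataset. The variable names are illustrative, and `explained_variance_ratio_` is a companion attribute not mentioned in the text above that reports the same ranking as fractions of the total variance.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Fit with all components to inspect how much variance each one explains.
pca_full = PCA().fit(X)
print(pca_full.explained_variance_)        # variance per component, largest first
print(pca_full.explained_variance_ratio_)  # same ranking, as fractions of the total

# Ask for two dimensions; the lower-variance components are dropped.
pca_2d = PCA(n_components=2)
X_reduced = pca_2d.fit_transform(X)
print(X_reduced.shape)  # (150, 2)
```

Inspecting the explained variance first is a reasonable way to pick a value for `n_components` instead of guessing one.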

In the following PCA example, the raw scatter plot of the iris dataset is on the left. The greatest variation lies in the direction of the red arrow ("PCA1"), and the runner-up is the orthogonal direction shown by the black arrow ("PCA2"). Now imagine rotating the dataset so that the two axes are the first two principal components. Finally, study the PCA scatter plot on the right, where the axes are the directions "PCA1" and "PCA2".

The connection between the right and left scatters should be clear in your mind before you move on from this section. It's this kind of intuition that will allow you to do powerful analysis while also knowing what the underlying mathematics is doing. The methods in this book are not black boxes, and you should force yourself to learn and understand them. You almost certainly do yourself a disservice as a data mining practitioner otherwise.
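One way to test that intuition is to check the rotation numerically. The sketch below assumes petal length and petal width as the two plotted features (the text does not say which two the figure uses) and verifies that the two directions are orthogonal and that projecting onto them amounts to a rotation of the centered data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: petal length and petal width (columns 2 and 3) stand in for
# the two features plotted in the figure.
X = load_iris().data[:, 2:4]

pca = PCA(n_components=2).fit(X)
pca1, pca2 = pca.components_  # unit vectors: the two arrows in the left plot

# The arrows are orthogonal, so their dot product is (numerically) zero.
dot = float(np.dot(pca1, pca2))

# Rotating = centering the data and expressing it in the PCA1/PCA2 basis.
rotated = (X - X.mean(axis=0)) @ pca.components_.T
same_as_sklearn = np.allclose(rotated, pca.transform(X))

# In the rotated frame the axes are uncorrelated and PCA1 has the larger variance.
cov_rotated = np.cov(rotated, rowvar=False)
print(abs(dot) < 1e-8, same_as_sklearn)
print(np.round(cov_rotated, 4))
```

The off-diagonal entries of the rotated covariance matrix should be close to zero, which is the numerical counterpart of the right-hand scatter plot being aligned with its axes.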
